Elkhound: A Fast, Practical GLR Parser Generator
Authors
Abstract
The Generalized LR (GLR) parsing algorithm is attractive for use in parsing programming languages because it is asymptotically efficient for typical grammars, and can parse with any context-free grammar, including ambiguous grammars. However, adoption of GLR has been slowed by high constant-factor overheads and the lack of a general, user-defined action interface. In this paper we present algorithmic and implementation enhancements to GLR to solve these problems. First, we present a hybrid algorithm that chooses between GLR and ordinary LR on a token-by-token basis, thus achieving competitive performance for deterministic input fragments. Second, we describe a design for an action interface and a new worklist algorithm that can guarantee bottom-up execution of actions for acyclic grammars. These ideas are implemented in the Elkhound GLR parser generator. To demonstrate the effectiveness of these techniques, we describe our experience using Elkhound to write a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient and maintainable, employing a range of disambiguation strategies.
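As a rough illustration of the token-by-token choice between GLR and ordinary LR mentioned in the abstract, the sketch below picks a mode for each incoming token. The table layout, the function name `choose_mode`, and the criterion used (a single active stack top with at most one table action for the token) are assumptions made for illustration only; the abstract does not state the exact conditions Elkhound uses.

```python
from typing import Dict, List, Tuple

# Hypothetical action table: (state, token) -> list of candidate actions.
# More than one candidate models a shift/reduce or reduce/reduce conflict.
ActionTable = Dict[Tuple[int, str], List[str]]

TABLE: ActionTable = {
    (0, "id"): ["shift 1"],
    (1, "+"): ["shift 2", "reduce E->id"],  # conflict: needs GLR
    (1, "$"): ["reduce E->id"],
}


def choose_mode(active_states: List[int], token: str, table: ActionTable) -> str:
    """Return "LR" when the next step looks deterministic, else "GLR".

    Assumed criterion (not taken from the paper): the parser has exactly
    one active stack top, and the table offers at most one action for
    that state and token.
    """
    if len(active_states) != 1:
        return "GLR"  # several stack tops are alive: stay in general mode
    actions = table.get((active_states[0], token), [])
    return "LR" if len(actions) <= 1 else "GLR"


if __name__ == "__main__":
    for states, tok in [([0], "id"), ([1], "+"), ([1, 2], "$")]:
        print(states, tok, choose_mode(states, tok, TABLE))
```

The point of such a check is that deterministic stretches of input can be handled by the cheaper LR step, with the general machinery reserved for tokens where a conflict or multiple stack tops make it necessary.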
Similar resources
A Fast General Parser for Automatic Code Generation
The code generator in a compiler attempts to match a subject tree against a collection of tree-shaped patterns for generating instructions. Tree-pattern matching may be considered as a generalization of string parsing. We propose a new generalized LR (GLR) parser, which extends the LR parser stack with a parser cactus. GLR explores all plausible parsing steps to find the least-cost matching. GL...
UGLR Parser for Phrase Structure Languages as an Extension of GLR Parser
This paper proposes the UGLR parser as an extension of the GLR parser. A UGLR parser is powerful enough to parse deterministically any phrase structure language if it is in the class of recursive languages and can parse any context free language as fast as the conventional GLR parser. Natural language processing often requires a parser for languages belonging to classes larger than that of cont...
PAPAGENO: A Parallel Parser Generator for Operator Precedence Grammars
In almost all language processing applications, languages are parsed employing classical algorithms (such as the LR(1) parsers generated by Bison), which are sequential due to their left-to-right state-dependent nature. Although early theoretical studies on parallel parsing algorithms delineated potential speedups on abstract parallel machines using a data-parallel approach, practical developme...
HASDF: A Generalized LR-parser Generator for Haskell
Language-centered software engineering requires language technology that (i) handles the full class of context-free grammars, and (ii) accepts grammars that contain syntactic information only. The syntax definition formalism SDF combined with GLR-parser generation offers such technology. We propose to make SDF and GLR-parsing available for use with various programming languages. We have done so...
Principled Parsing for Indentation-Sensitive Languages
Many languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars are not able to express these layout rules, existing parsers use ad hoc techniques to handle them. These techniques tend to be low-level and operational in nature, and thus forgo the advantages of more declarative specifications like context-free grammar...